Brazilian Patent BR112016007142B1: Method for recovering from a computer system failure in which the contents of volatile memory are lost


Abstract:
METHOD FOR RECOVERING FROM A COMPUTER SYSTEM FAILURE IN WHICH THE CONTENTS OF A VOLATILE MEMORY ARE LOST. The invention concerns failure recovery of a computing system in the context in which the computing system includes a volatile memory whose contents are lost due to a failure, a non-volatile buffer memory that (along with the volatile memory) contributes to the active memory of the computing system, and non-volatile storage. Recovery occurs by identifying pages that were in volatile memory at the time of the failure. For each of these pages, recovery determines whether to load the page into volatile memory from the non-volatile buffer memory or from storage, and then loads it. In embodiments where access speeds for the non-volatile buffer memory can be much faster than for storage, and where many of the pages to be recovered are loaded from the non-volatile buffer memory, recovery time can be reduced.
Publication No.: BR112016007142B1
Application No.: R112016007142-5
Filing date: 2014-10-27
Publication date: 2022-01-25
Inventors: Dexter Paul Bradshaw; Pedro Celis
Applicant: Microsoft Technology Licensing, Llc

Description:

BACKGROUND
[001] Applications often use volatile memory to operate efficiently. During operation, data is read from the mechanical disk into memory, and potentially also written back to the mechanical disk, in discretely sized units called "pages". A buffer pool is memory used to cache blocks of memory (such as pages) as the blocks are read from the mechanical disk or modified in memory. The buffer pool improves performance by allowing data to be accessed from memory rather than from the mechanical disk. As one example, databases often use buffer pools to manage and index pages in memory.
[002] As a system operates, it randomly accesses a working set of pages. Over time, as the system operates, the working set changes, which often also changes the size of the working set. If the working set is larger than the buffer pool available in random access memory (RAM), the system performs more accesses to the mechanical disk.
[003] A mechanical disk is built around a rotating magnetic medium over which a disk head seeks to read and access data. Sequential reads and writes are more efficient because they do not require a mechanical seek of the disk head; they merely incur the delay of electronic transmission from the disk head and controller circuitry to memory. Mechanical disks are therefore used much more efficiently for sequential operations, whereas random access operations on a mechanical disk can significantly reduce system performance. As the working set grows larger than the buffer pool, pages have to be evicted from the buffer pool and written to disk using random access operations. Consequently, as the working set grows larger than the buffer pool, system performance degrades.
BRIEF SUMMARY
[004] At least one embodiment described in this document relates to failure recovery of a computing system that includes a volatile memory whose contents are lost due to the failure, a non-volatile buffer memory that, together with the volatile memory, contributes to the active memory of the computing system, and non-volatile storage. Recovery occurs by identifying pages that were in volatile memory at the time of the failure. For each of these pages, recovery determines whether to load the page into volatile memory from the non-volatile buffer memory or from storage, and then loads it from that source. In some embodiments in which the computing system is transactional, recovery also identifies transactions that were active at the time of the failure and undoes the actions of each of those transactions.
[005] Access speeds for the non-volatile buffer memory can be much faster than for storage such as disk or other spinning storage. For example, the non-volatile buffer memory can be storage class memory (SCM) such as a solid state disk (SSD). Thus, when many of the pages to be recovered are loaded from the non-volatile buffer memory rather than from storage, recovery time can be reduced, perhaps substantially.
[006] This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[007] In order to describe the manner in which the aforementioned advantages and other advantages and aspects can be obtained, a more particular description of various embodiments will be provided by reference to the accompanying drawings. Understanding that these drawings represent illustrative embodiments only and, therefore, are not to be considered as limiting the scope of the invention, the embodiments will be described and explained with specificity and additional detail through the use of the accompanying drawings, in which:
[008] Figure 1 illustrates a computing system in which some embodiments described in this document can be employed;
[009] Figure 2 illustrates a memory hierarchy that includes a volatile memory, non-volatile storage and non-volatile buffer memory;
[0010] Figure 3 illustrates an illustrative general flow associated with recovering the computing system having a memory hierarchy;
[0011] Figure 4 illustrates a flowchart of a method for recovering from a computer system failure in which volatile memory contents are lost;
[0012] Figure 5 illustrates a flowchart of a more specific method for recovery from a computer system failure in which volatile memory contents are lost in the context of the system being a transactional system;
[0013] Figure 6 illustrates a flowchart of a method to automatically identify the various pages that were in volatile memory at the time of the failure, and automatically identify transactions that were active at the time of the failure; and
[0014] Figure 7 illustrates a timeline of a log in the context of an illustrative analysis phase, redo phase and undo phase.
DETAILED DESCRIPTION
[0015] According to embodiments described in this document, failure recovery of a computing system is described. The computing system includes volatile memory whose contents are lost during a failure, non-volatile buffer memory that (along with the volatile memory) contributes to the active memory of the computing system, and non-volatile storage. Recovery occurs by identifying the pages that were in volatile memory at the time of the failure. For each of these pages, recovery determines whether to load the page into volatile memory from the non-volatile buffer memory or from storage, and then loads it. In embodiments in which access speeds for the non-volatile buffer memory can be much faster than for storage, and in which many of the pages to be recovered are loaded from the non-volatile buffer memory, the end-to-end recovery time can be reduced. End-to-end recovery time includes the failure recovery time as well as the restart or warm-up time needed for the system to return to steady-state performance. First, some introductory discussion regarding a computing system will be described with respect to Figure 1. Then, embodiments of recovery will be described with respect to Figures 2 through 7.
[0016] Computing systems today are increasingly taking a wide variety of forms. Computing systems can, for example, be handheld devices, appliances, laptop computers, desktop computers, mainframe computers, distributed computing systems, or even devices that have not conventionally been considered computing systems. In this specification and the claims, the term "computing system" is defined broadly as including any device or system (or combination thereof) that includes at least one physical and tangible processor, and physical and tangible memory capable of holding computer-executable instructions that can be executed by the processor. The memory can take any form and can depend on the nature and form of the computing system. A computing system may be distributed over a network environment and may include multiple constituent computing systems.
[0017] As illustrated in Figure 1, a computing system 100 includes at least one processing unit 102 and memory 104. Memory 104 may be physical system memory, which may be volatile, non-volatile, or some combination of the two. The computing system 100 also includes non-volatile memory or storage 106. If the computing system is distributed, the processing, memory and/or storage capability may be distributed as well. As used in this document, the term "module" or "component" can refer to software objects or routines that execute on the computing system. The different components, modules, engines, and services described in this document may be implemented as objects or processes that execute on the computing system (e.g., as separate threads).
[0018] In the following description, embodiments are described with reference to acts that are performed by one or more computing systems. If such acts are implemented in software, one or more processors of the associated computing system that performs the act direct the operation of the computing system in response to having executed computer-executable instructions. An example of such an operation involves the manipulation of data. The computer-executable instructions (and the manipulated data) may be stored in memory 104 and/or storage 106 of the computing system 100. The computing system 100 may also contain communication channels 108 that allow the computing system 100 to communicate with other message processors over, for example, network 110.
[0019] Embodiments described in this document may comprise or utilize a special-purpose or general-purpose computer including hardware, such as, for example, one or more processors and system memory, as discussed below in greater detail. Embodiments described herein also include computer program products in the form of one or more physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media may be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions are physical storage media. Computer-readable media that carry computer-executable instructions are transmission media. Thus, by way of example and not limitation, embodiments of the invention may comprise at least two distinctly different types of computer readable media: computer storage medium and transmission medium.
[0020] Computer storage media include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to store desired program code means in the form of computer-executable instructions or data structures and that can be accessed by a general-purpose or special-purpose computer.
[0021] A "network" is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer, the computer properly views the connection as a transmission medium. Transmission media can include a network and/or data links that can be used to carry desired program code means in the form of computer-executable instructions or data structures and that can be accessed by a general-purpose or special-purpose computer. Combinations of the above should also be included within the scope of computer-readable media.
[0022] Further, upon reaching various computer system components, program code means in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a "NIC"), and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system. Thus, it should be understood that computer storage media can be included in computer system components that also (or even primarily) use transmission media.
[0023] Computer-executable instructions comprise, for example, instructions and data which, when executed at a processor, cause a general-purpose computer, special-purpose computer, or special-purpose processing device to perform a certain function or group of functions, such as the functions described in this document. The computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code. Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the features or acts described above. Rather, the described features and acts are disclosed as example forms of implementing the claims.
[0024] Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, handheld devices, multiprocessor systems, microprocessor-based or programmable electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked through a network (by hardwired data links, wireless data links, or a combination of hardwired and wireless data links), both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.
[0025] Figure 2 illustrates a memory hierarchy 200 that includes volatile memory 201, non-volatile storage 203, and non-volatile buffer memory 202. The non-volatile buffer memory 202 and the non-volatile storage 203 are shown with thicker dashed borders, symbolizing that their contents will typically survive a loss of power. In contrast, volatile memory 201 relies on power to refresh its contents, so the contents of volatile memory 201 are lost upon a loss of power. For example, if memory hierarchy 200 were present in computing system 100 of Figure 1, volatile memory 201 would be an example of the volatile portion of memory 104 of Figure 1, non-volatile buffer memory 202 would be an example of the non-volatile portion of memory 104 of Figure 1, and non-volatile storage 203 would be an example of storage 106 of Figure 1.
[0026] The non-volatile buffer memory 202 may form a single level of memory hierarchy 200. However, in some embodiments, there may be more than one level of non-volatile buffer memory in memory hierarchy 200, as represented by ellipses 204. For example, one portion of the non-volatile buffer memory 202 may have faster access speeds (i.e., be higher in the memory hierarchy) than another portion that has slower access speeds (i.e., is lower in the memory hierarchy).
[0027] The volatile memory 201 and the non-volatile buffer memory 202 may together make up the system memory 211 of the computing system, the system memory 211 including the range of addresses that are addressable by the processor(s) of the computing system. System memory 211 includes the working set 210 of pages that are most frequently accessed by the computing system. The working set 210 represents the data and instructions that are actively being used by the computing system to perform its current function, and it is characterized by largely random access. Volatile memory 201 has efficient random access speed (hence the term "Random Access Memory" or "RAM"). However, the non-volatile buffer memory 202 is also efficient at random access, especially compared to storage 203, which is better suited to sequential access. Thus, the non-volatile buffer memory 202 and volatile memory 201 together act as one large random access memory, allowing the system to work with a much larger working set 210 than would be possible with volatile memory 201 alone.
[0028] As an example, the non-volatile buffer memory 202 may be storage class memory, such as a solid state disk. The term "storage class memory" is known in the art, and this description incorporates the known definition of the term. A storage class memory has the following properties:
[0029] 1. The memory is solid state;
[0030] 2. Memory is randomly accessible;
[0031] 3. Memory has lower latency than mechanical disk;
[0032] 4. Memory has higher I/O throughput rate than mechanical disk due to random access being a solid state electronic process without mechanical movement of the disk head.
[0033] In addition, storage class memory is non-volatile when used as non-volatile memory 202 of Figure 2.
[0034] A solid-state disk is a type of storage-class memory and is distinguished from a mechanical disk in that it is a solid-state device. Solid state disk additionally has the following properties that may also be included in some, but perhaps not all, of the other types of storage class memory that can be used with the principles described in this document:
[0035] 1. Fine-grained random access.
[0036] 2. Larger capacities than DRAM (capacity is on the order of that of magnetic disk).
[0037] 3. Higher transistor densities than DRAM, more storage per unit area (and volume).
[0038] 4. Lower power consumption and dissipation than rotating medium and DRAM.
[0039] 5. Typically, there is no direct memory access (DMA) between the SSD and the disk. Instead, data has to flow through DRAM to get to the disk.
[0040] Other types of storage class memory include Phase Change Memory (PCM), Ferrous Oxide, and Memristor, which potentially have lower latencies and finer access granularities than solid state disks. However, the principles described in this document are not limited to currently existing storage class memory technology, and may be extended to apply to storage class memory technology developed in the future, or to any second-level memory other than storage class memory as well.
[0041] Referring again to Figure 2, in the embodiments described in this document, the memory hierarchy operates on data segments referred to as "pages". In this description and the claims, a "page" is defined as any group of data that is exchanged as a whole between system memory 211 and storage 203, and/or between volatile memory 201 and non-volatile buffer memory 202. Additionally, although not required, system memory 211 may have a promotion and eviction mechanism whereby pages that are used more frequently tend to be loaded higher in the memory hierarchy, and pages that are used less frequently tend to be demoted to lower levels of the memory hierarchy. In some cases, when a page is read from non-volatile buffer memory 202 into volatile memory 201, a copy of the page is retained within non-volatile buffer memory 202. Similarly, in some cases, when a page is read from storage 203 into non-volatile buffer memory 202 or volatile memory 201, a copy of the page is retained within storage 203.
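The promotion and eviction behavior described above can be sketched as a two-level page cache. This is a minimal illustration under stated assumptions, not the patented mechanism: the class `TieredBufferPool`, its LRU policy, and the capacities are all hypothetical. It shows the property that matters for recovery: a page promoted from the non-volatile level keeps its copy there, so a crash that clears only the volatile level leaves that copy available.

```python
from collections import OrderedDict


class TieredBufferPool:
    """Hypothetical two-level page cache: volatile memory above a non-volatile buffer."""

    def __init__(self, volatile_capacity: int, nvm_capacity: int, storage: dict):
        self.volatile = OrderedDict()  # page_id -> data, least recently used first
        self.nvm = OrderedDict()       # page_id -> data, survives a power loss
        self.storage = storage         # page_id -> data, survives a power loss
        self.volatile_capacity = volatile_capacity
        self.nvm_capacity = nvm_capacity

    def read(self, page_id):
        if page_id in self.volatile:
            self.volatile.move_to_end(page_id)  # mark as most recently used
            return self.volatile[page_id]
        if page_id in self.nvm:
            data = self.nvm[page_id]            # the NVM copy is retained
            self.nvm.move_to_end(page_id)
        else:
            data = self.storage[page_id]        # the storage copy is retained
        self._promote(page_id, data)
        return data

    def write(self, page_id, data):
        # Older copies in NVM or storage become stale; redo repairs them after a crash.
        self._promote(page_id, data)

    def _promote(self, page_id, data):
        self.volatile[page_id] = data
        self.volatile.move_to_end(page_id)
        while len(self.volatile) > self.volatile_capacity:
            victim, victim_data = self.volatile.popitem(last=False)
            self._demote(victim, victim_data)

    def _demote(self, page_id, data):
        self.nvm[page_id] = data
        self.nvm.move_to_end(page_id)
        while len(self.nvm) > self.nvm_capacity:
            victim, victim_data = self.nvm.popitem(last=False)
            self.storage[victim] = victim_data  # coldest page goes to storage

    def crash(self):
        self.volatile.clear()  # only the volatile level is lost
```

LRU is used here only because it is simple; the text does not name a replacement policy.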
[0042] Figure 3 illustrates an illustrative general flow 300 associated with recovering the computing system 100 having the memory hierarchy 200. Upon boot (boot 301) after a loss of power, the computing system enters an initialization phase 302, followed by an analysis phase 311, followed by a redo phase 312 and, in a transactional system, followed by an undo phase 313.
[0043] It is noted here that conventional recovery algorithms also use analysis, redo and undo phases. For example, Algorithms for Recovery and Isolation Exploiting Semantics (hereinafter "ARIES") is an algorithm that includes such phases. However, ARIES is designed for database recovery in an environment in which system memory is fully rebuilt by reading pages from storage back into memory. The principles described in this document build on ARIES and reduce recovery time by restoring pages into volatile memory 201 from non-volatile buffer memory 202 as well as from storage 203. In fact, if normal forward processing is further modified to take snapshots (also referred to as "checkpoints") to non-volatile buffer memory 202, recovery time can be reduced further, especially if snapshots to non-volatile buffer memory 202 are frequent, and more frequent than snapshots to storage 203. Checkpointing is an optimization that reduces the size of the log that must be processed during recovery, and thus reduces the number of redo and undo actions that have to be performed during crash recovery.
[0044] Figure 3 will be referred to frequently when describing additional recovery details below. Figure 3 is a general diagram; more specific details regarding which functions can be performed in which phase are outlined in further detail below. In the recovery described below, normal processing is modified to include two types of snapshots: a less frequent snapshot to storage 203, and a new, more frequent snapshot to non-volatile buffer memory 202. Additionally, the analysis phase of the ARIES algorithm is modified to form the analysis phase 311 of Figure 3, and the redo phase of the ARIES algorithm is modified to form the redo phase 312 of Figure 3.
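The effect of the two snapshot frequencies on how much log must be scanned can be illustrated with a small simulation. This sketch assumes a simple log of `(lsn, kind)` tuples and fixed checkpoint intervals. The functions `run_forward_processing` and `records_to_scan` and the interval values are hypothetical illustrations, not figures from the patent. The scan is assumed to start at the penultimate checkpoint of the chosen kind, as the later description of the analysis phase states.

```python
def run_forward_processing(num_updates, nvm_interval, storage_interval):
    """Log updates and take two kinds of checkpoints; returns [(lsn, kind), ...]."""
    log = []
    lsn = 0

    def append(kind):
        nonlocal lsn
        lsn += 1
        log.append((lsn, kind))

    for update in range(1, num_updates + 1):
        append("update")
        if update % nvm_interval == 0:
            append("checkpoint-nvm")       # taken frequently
        if update % storage_interval == 0:
            append("checkpoint-storage")   # taken less frequently
    return log


def records_to_scan(log, checkpoint_kind):
    """Count log records after the penultimate checkpoint of the given kind."""
    checkpoints = [lsn for lsn, kind in log if kind == checkpoint_kind]
    start = checkpoints[-2] if len(checkpoints) >= 2 else 0
    return sum(1 for lsn, _ in log if lsn > start)
```

For example, with 12 updates, a checkpoint to non-volatile buffer memory every 3 updates and a checkpoint to storage every 6, the 18-record log needs 5 records scanned from the penultimate non-volatile checkpoint versus 9 from the penultimate storage checkpoint.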
[0045] Figure 4 illustrates a flowchart of a method 400 for recovering from a computer system failure in which volatile memory contents are lost. Method 400 may be performed in the context of computing system 100 having memory hierarchy 200, and thus method 400 will be described with frequent reference to Figures 1 and 2. Additionally, recovery may follow the general flow 300 of Figure 3, and thus method 400 will also be described with frequent reference to Figure 3.
[0046] Method 400 involves automatically identifying the pages that were in volatile memory at the time of the failure (act 401). These identified pages are also the pages that are to be recovered back into volatile memory 201 for the system to recover. With reference to Figure 2, recall that the contents of non-volatile buffer memory 202 and non-volatile storage 203 are maintained regardless of a loss of power to the computing system. However, volatile memory 201 requires power to maintain its contents. As a result, all pages that were in volatile memory 201 are lost when power is lost. Referring to Figure 3, identification of the pages that were in volatile memory at the time of the failure can be performed during the analysis phase 311 in the specific example provided below.
[0047] For each of the pages that were in volatile memory at the time of the failure, the contents of box 410 are performed. Specifically, the system automatically determines a recovery source for a recovery version of the page (act 411). In other words, the system determines whether to recover the page from storage 203 or from non-volatile buffer memory 202 of the computing system. The recovery source could be storage 203, as in conventional recovery mechanisms such as ARIES recovery.
[0048] However, unlike conventional recovery mechanisms, the recovery source can also be non-volatile buffer memory 202. If there are multiple levels of non-volatile buffer memory 202, then, in cases where the recovery version is located in non-volatile buffer memory 202, the system can also determine from which level of non-volatile buffer memory 202 to load the page. Referring to Figure 3, the identification of the recovery source can be performed in the analysis phase 311 in the specific example provided below.
[0049] For each page to be recovered, the page is then loaded from the recovery source into volatile memory (act 412). For example, if the recovery version of the page was located in non-volatile buffer memory 202, the page would be loaded into volatile memory 201 from non-volatile buffer memory 202. On the other hand, if the recovery version of the page was located in storage 203, the page would be loaded into volatile memory 201 from storage 203. In some cases, during normal operation, when a page is read from non-volatile buffer memory 202 into volatile memory 201, a copy of the page is retained within non-volatile buffer memory 202. In that case, most of the pages to be recovered can be loaded from non-volatile buffer memory 202 rather than from storage 203. Since random access speeds for non-volatile buffer memory 202 are much faster than for storage 203, this significantly speeds up recovery. Referring to Figure 3, loading pages from non-volatile buffer memory 202 or from storage 203 may be part of the redo phase 312 in the specific example provided below.
[0050] Optionally, the system builds a mapping (act 413) that identifies, for each page that has a recovery version within the non-volatile buffer memory, the location of that recovery version within the non-volatile buffer memory. Referring to Figure 3, this can be performed during the initialization phase 302. Loading the page from the recovery source (act 412) uses this mapping to find the location of the recovery version for each page whose recovery version is located within the non-volatile buffer memory. Once all pages that were present in volatile memory 201 before the failure have been loaded from their respective recovery sources back into volatile memory 201, volatile memory 201 again holds the pages it had at the time of the failure.
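Acts 411, 412 and 413 can be sketched together as follows. The page file is modeled as a list of slots, some empty, and the mapping is a dictionary from page identifier to slot index. The names `PageFileSlot`, `build_nvm_mapping`, and `recover_pages` and the slot layout are hypothetical. The rule of preferring the non-volatile copy whenever the mapping has one is an assumption made for this sketch. Any staleness in the loaded version is left for the redo phase to correct.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageFileSlot:
    """Hypothetical slot in the non-volatile page file."""
    page_id: int
    data: bytes


def build_nvm_mapping(page_file: list[Optional[PageFileSlot]]) -> dict[int, int]:
    """Act 413: map each page_id with a copy in NVM to its slot index."""
    return {slot.page_id: i for i, slot in enumerate(page_file) if slot is not None}


def recover_pages(pages_to_recover, page_file, storage):
    """Acts 411 and 412: pick a recovery source per page, then load it."""
    mapping = build_nvm_mapping(page_file)
    volatile, sources = {}, {}
    for page_id in pages_to_recover:
        if page_id in mapping:  # assumption: prefer NVM whenever it holds a copy
            volatile[page_id] = page_file[mapping[page_id]].data
            sources[page_id] = "nvm"
        else:
            volatile[page_id] = storage[page_id]
            sources[page_id] = "storage"
    return volatile, sources
```

Building the mapping once, during initialization, keeps each per-page lookup in act 412 to a single dictionary access.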
[0051] Figure 5 illustrates a flowchart of a more specific method 500 for recovering from a computer system failure in which volatile memory contents are lost, in the context of the system being a transactional system. Method 500 is a specific example of method 400 of Figure 4, but includes some additional acts. Accordingly, acts 401, 410, 412 and 413 of method 400 are again illustrated within Figure 5.
[0052] In a transactional system, the system determines which transactions were prepared at the time of the failure (act 511). Referring to Figure 3, the identification of prepared transactions can be performed during the analysis phase 311 in the example below. The system then prepares each of the transactions that were prepared at the time of the failure (act 512) before loading pages from the recovery source into volatile memory (act 412). Referring to Figure 3, this preparation of transactions can be performed as part of the analysis phase 311.
[0053] Some transactions may have committed. The changes made by these committed transactions were written to the log, but the affected pages may not have been flushed to storage 203. After the analysis phase, during the redo phase, the old versions of these pages are read back into volatile memory 201 with stale, not-yet-updated data. The redo phase reapplies changes to each page starting from the penultimate checkpoint for that particular page. At the end of the redo phase, all committed changes will have been applied to these pages, but changes from uncommitted transactions will also have been applied. Hence the need for an undo phase that rolls back the changes of uncommitted transactions, bringing all pages that were active before the system crash back to a consistent state.
[0054] Transactions that were active (i.e., uncommitted) at the time of the failure must be rolled back because of the failure in order to bring the system to a transactionally consistent state. As a result, the system identifies the transactions that were active at the time of the failure (act 521). Referring to Figure 3, this identification of active transactions may occur as part of the analysis phase 311 in the specific example provided below. The system then undoes all actions of each of the active transactions (act 522) after all pages have been recovered into volatile memory (act 412). Referring to Figure 3, undoing such transactions may be performed as part of the undo phase 313 in the specific example provided herein.
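The interplay of redo and undo described in the two paragraphs above can be sketched with value-based log records. This is a simplified illustration. `LogRecord`, its fields, and the per-page `page_lsn` bookkeeping are hypothetical. The sketch also omits details of a full ARIES implementation, such as compensation log records written during undo. Redo repeats every logged update that the loaded page version has not yet seen, including updates of uncommitted transactions. Undo then rolls back the updates of those transactions, newest first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    txn: str
    kind: str                     # hypothetical kinds: "update" or "commit"
    page_id: Optional[int] = None
    before: Optional[str] = None  # value before the update, used by undo
    after: Optional[str] = None   # value after the update, used by redo


def redo(log, pages, page_lsn):
    """Reapply every update the loaded page version has not yet seen."""
    for rec in log:
        if rec.kind == "update" and rec.lsn > page_lsn.get(rec.page_id, 0):
            pages[rec.page_id] = rec.after
            page_lsn[rec.page_id] = rec.lsn


def undo(log, pages, active_txns):
    """Roll back updates of transactions that never committed, newest first."""
    for rec in reversed(log):
        if rec.kind == "update" and rec.txn in active_txns:
            pages[rec.page_id] = rec.before
```

In the test scenario, T1 commits and T2 does not. After redo, both pages reflect every logged update. After undo, only T1's committed change survives.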
[0055] Figure 6 illustrates a flowchart of a method 600 to automatically identify the various pages that were in volatile memory at the time of the failure, and automatically identify transactions that were active at the time of the failure. Method 600 represents an example of act 401 of Figures 4 and 5 (in case of identifying pages that were in volatile memory at the time of failure) and act 521 of Figure 5 (in case of identifying active transactions at the time of failure). Method 600 may be performed during analysis phase 311 in the specific example provided below.
[0056] The system identifies the last log sequence number flushed to a log in non-volatile memory (act 601), and then sequentially inspects the log entries of the log starting from this log sequence number (act 602). Based on the analysis of the log entries, the system identifies the pages that were in volatile memory at the time of the failure (act 603), and also identifies the transactions that were active at the time of the failure (act 604).
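Acts 602 through 604 can be sketched as a single forward pass over the log. The sketch assumes the scan starting point from act 601 is supplied as `start_lsn`. `LogEntry`, `analyze`, and the record kinds are hypothetical. A page is treated as having been in volatile memory if an update to it appears in the scanned range. A transaction is treated as active if it began or updated something in that range and had not committed or aborted by the end of the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    lsn: int
    txn: str
    kind: str                    # hypothetical kinds: "begin", "update", "commit", "abort"
    page_id: Optional[int] = None


def analyze(log, start_lsn):
    """Scan forward from start_lsn; return (pages in memory, active transactions)."""
    pages_in_memory = set()
    active = set()
    for entry in log:
        if entry.lsn < start_lsn:
            continue                      # before the scan starting point
        if entry.kind == "begin":
            active.add(entry.txn)
        elif entry.kind == "update":
            active.add(entry.txn)         # the txn may have begun before start_lsn
            pages_in_memory.add(entry.page_id)
        elif entry.kind in ("commit", "abort"):
            active.discard(entry.txn)
    return pages_in_memory, active
```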
[0057] Figure 7 illustrates the three phases, namely the analysis phase 720, the redo phase 730, and the undo phase 740, in the context of a log timeline 710. The analysis phase 720, redo phase 730 and undo phase 740 of Figure 7 are examples of the analysis phase 311, redo phase 312, and undo phase 313, respectively, of Figure 3.
[0058] As the computing system operates normally, it maintains a log of significant events. For example, in Figure 7, among other things, the computing system has logged the start of active transactions, when checkpoints occur, when a page becomes dirty (that is, is written in system memory 211 without being written to storage 203), and so on. When an event is recorded in the log, the event is assigned an identifier from which the order of events can be derived. For example, the identifier might be a log sequence number (LSN) that is incremented for each logged event. Thus, events that are later in the log have higher log sequence numbers than events that are earlier in the log.
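The LSN assignment described above reduces to a monotonically increasing counter. The sketch below is a minimal illustration. The class `EventLog` and the event names are hypothetical, and a real log would also persist its records rather than keep them in a Python list.

```python
import itertools


class EventLog:
    """Hypothetical append-only log that stamps each event with an increasing LSN."""

    def __init__(self):
        self.records = []                   # (lsn, event, details) in append order
        self._next_lsn = itertools.count(1)

    def append(self, event, **details):
        lsn = next(self._next_lsn)
        self.records.append((lsn, event, details))
        return lsn
```

Because each LSN is larger than every earlier one, comparing two LSNs is enough to tell which event happened first.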
[0059] In the illustrative embodiment of Figure 7, the computing system also writes some items of information to a part of storage that is accessible at boot time. For example, the information can be written to the computing system's boot block. The information includes the log sequence number of the last checkpoint, as well as a globally unique identifier of the last page file flushed from system memory 211 to storage 203. The page file includes all pages included within non-volatile buffer memory 202. The globally unique identifier is changed each time the page file in non-volatile buffer memory 202 is changed (for example, each time the page file is created, deleted, formatted with a new version, or otherwise changed).
[0060] In this embodiment, the analysis phase 720 has several functions, including identifying 1) at which point in the log the redo phase 730 should begin, 2) which pages need to be redone (i.e., loaded into volatile memory 201), 3) which transactions were active at the time of the failure, and 4) which transactions were prepared at the time of the failure.
[0061] In the analysis phase 720, the log is scanned to identify the pages that need to be loaded into volatile memory 201. To do this, the analysis first determines where to start scanning the log going forward. The globally unique identifier of the page file in non-volatile buffer memory 202 is compared to the globally unique identifier of the last page file flushed to storage 203. If they do not match, the log is scanned from the penultimate snapshot to storage 203, much as it would be during normal ARIES recovery.
[0062] If they do match, however, the page file within non-volatile memory 202 is valid, and the checkpoints to non-volatile memory 202 can be used to perform recovery. The forward scan (to identify which pages are to be loaded into volatile memory 201) therefore starts at the penultimate checkpoint to non-volatile memory 202. Because checkpoints to non-volatile memory 202 are taken more often during normal forward processing, the log can be scanned from a much later log sequence number. Recovery is much faster for two reasons. First, less of the log needs to be scanned because checkpoints to non-volatile memory 202 are more frequent. Second, more of the pages to be recovered into volatile memory 201 can be read from the faster-access non-volatile memory device. Even when the non-volatile memory checkpoints can be used, it is possible that the checkpoint to storage 203 will be used if it occurs later than any checkpoint to non-volatile memory 202. This is rare, given how frequently checkpoints to non-volatile memory 202 occur.
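The choice of where the forward scan starts can be sketched as a small function. This is an interpretation of [0061] and [0062]. It assumes that checkpoint positions are known as lists of LSNs, and it reads "the checkpoint to storage 203 will be used if it occurs later" as taking the later of the two penultimate checkpoints. The function names are hypothetical.

```python
from typing import List, Optional


def penultimate(lsns: List[int]) -> Optional[int]:
    """Return the second-to-last checkpoint LSN, or None if there are fewer than two."""
    ordered = sorted(lsns)
    return ordered[-2] if len(ordered) >= 2 else None


def analysis_start_lsn(nvm_guid: str, flushed_guid: str,
                       storage_checkpoints: List[int],
                       nvm_checkpoints: List[int]) -> int:
    """Choose the LSN at which the forward analysis scan begins."""
    storage_start = penultimate(storage_checkpoints)
    if storage_start is None:
        storage_start = 0  # assumption: no usable checkpoint, scan the whole log
    if nvm_guid != flushed_guid:
        # The page file in non-volatile memory is not trusted: ARIES-style start.
        return storage_start
    nvm_start = penultimate(nvm_checkpoints)
    if nvm_start is None:
        return storage_start
    # NVM checkpoints are usually later, but a later storage checkpoint still wins.
    return max(nvm_start, storage_start)
```

For example, with storage checkpoints at LSNs 10, 50 and 90 and non-volatile memory checkpoints at 60, 70, 80 and 85, a GUID mismatch starts the scan at 50 and a match starts it at 80.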
[0063] Processing normally associated with an analysis phase may also be performed as part of the analysis phase 720. For example, the starting point for the redo phase 730 is identified, as are the transactions that were active or prepared at the time of the failure.
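Identifying the transactions that were active or prepared at the time of the failure amounts to replaying begin, update, prepare, commit and abort events over a transaction table. The sketch below assumes a simplified `(lsn, kind, txn_id)` record shape and an optional transaction table saved with the checkpoint. Both are illustrative assumptions, not the patent's format.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical log record shape for this sketch: (lsn, kind, txn_id).
Record = Tuple[int, str, str]


def classify_transactions(records: Iterable[Record], start_lsn: int,
                          checkpoint_table: Optional[Dict[str, str]] = None
                          ) -> Tuple[Set[str], Set[str]]:
    """Return (active, prepared) transaction ids as of the end of the log."""
    # Seed with the transaction table assumed to be saved at the checkpoint.
    state: Dict[str, str] = dict(checkpoint_table or {})
    for lsn, kind, txn in records:
        if lsn < start_lsn:
            continue
        if kind in ("commit", "abort"):
            state.pop(txn, None)          # finished: nothing to recover
        elif kind == "prepare":
            state[txn] = "prepared"
        elif kind in ("begin", "update"):
            state.setdefault(txn, "active")
    active = {t for t, s in state.items() if s == "active"}
    prepared = {t for t, s in state.items() if s == "prepared"}
    return active, prepared
```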
[0064] At this point, all transactions in the system are added to the list of active transactions in preparation for the forward scan of the log from the LSN immediately after checkpoint 712. In addition, a dirty page table is constructed, which is empty at the start of the scan. By the end of the scan, the dirty page table includes all pages whose minimum log sequence number is greater than the log sequence number of the last flush to the log instance (i.e., MinRecoveryLSN), which is read from the boot block.
[0065] The dirty page table also indicates where the latest version of each page is located: in non-volatile memory 202 or in storage 203.
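Here is a minimal sketch of building the dirty page table. It assumes each page-dirtying log record is available as `(lsn, page_id)` and that the set of pages held in a valid non-volatile page file is known. It also makes two simplifying assumptions: a page's recovery LSN is the first record that dirties it after MinRecoveryLSN, and non-volatile memory 202 is preferred whenever the page file is valid and holds the page.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record shape: (lsn, page_id) for each log record that dirties a page.
DirtyRecord = Tuple[int, str]


def build_dirty_page_table(dirty_records: Iterable[DirtyRecord],
                           min_recovery_lsn: int,
                           nvm_pages: Set[str],
                           nvm_valid: bool) -> Dict[str, Tuple[int, str]]:
    """Map page_id -> (recovery LSN, location of the latest version)."""
    table: Dict[str, Tuple[int, str]] = {}  # empty at the start of the scan
    for lsn, page in sorted(dirty_records):
        if lsn <= min_recovery_lsn:
            continue  # already covered by the last flush (MinRecoveryLSN)
        if page not in table:
            # Assumption: a valid page file that holds the page has the newer copy.
            location = "nvm" if nvm_valid and page in nvm_pages else "storage"
            table[page] = (lsn, location)
    return table
```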
[0066] In the redo phase 730, all prepared transactions identified in the analysis phase 720 are prepared first. Then all pages, starting at the oldest dirty page in the dirty page table (represented by element 714 in Figure 7), are loaded from the appropriate location identified in the analysis phase 720. This can be performed as in the normal redo phase of ARIES recovery. The difference is that the appropriate source for each page to be loaded into volatile memory may now be either storage 203 or, probably more often, non-volatile memory 202. Because random-access loads from non-volatile memory 202 are much faster, this speeds up recovery significantly. Conventional recovery mainly involves random accesses to retrieve pages from the disk medium and sequential scans of the log and journal files. In contrast, under the principles described in this document, most or all of the working page set is likely to be in non-volatile RAM. Page requests during recovery are therefore more likely to be random accesses to non-volatile solid-state memory, which handles random-access page requests much better than disk storage does. Consequently, the redo phase is much more efficient under the principles described in this document.
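The redo step can be sketched with the usual ARIES-style check: a logged change is reapplied only if it is at or after the page's recovery LSN and newer than the LSN stamped on the loaded copy. The reader callbacks stand in for loads from non-volatile memory 202 and storage 203. The record shapes are hypothetical, and the re-preparation of prepared transactions is omitted.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[int, str]          # (page_lsn, contents) in this sketch
Update = Tuple[int, str, str]   # hypothetical redo record: (lsn, page_id, new_contents)


def redo(dirty_pages: Dict[str, Tuple[int, str]],
         updates: List[Update],
         read_nvm: Callable[[str], Page],
         read_storage: Callable[[str], Page]) -> Dict[str, Page]:
    """Load each dirty page from its recorded source, then replay logged updates."""
    memory: Dict[str, Page] = {}
    for page_id, (_rec_lsn, location) in dirty_pages.items():
        reader = read_nvm if location == "nvm" else read_storage
        memory[page_id] = reader(page_id)
    for lsn, page_id, contents in sorted(updates):
        if page_id not in dirty_pages:
            continue                      # page was not dirty: nothing to redo
        rec_lsn, _location = dirty_pages[page_id]
        page_lsn, _old = memory[page_id]
        # Redo only changes that the loaded copy does not yet reflect.
        if lsn >= rec_lsn and lsn > page_lsn:
            memory[page_id] = (lsn, contents)
    return memory
```

Usage: with `nvm = {"P1": (14, "a")}` and `disk = {"P2": (3, "x")}`, passing `nvm.__getitem__` and `disk.__getitem__` as the readers loads P1 from the faster device and P2 from storage.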
[0067] In the undo phase 740, log 710 is scanned backward from the end of the log (element 716) all the way back to the beginning of the oldest active transaction (element 711). Any logged actions that are part of an active transaction are undone.
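This is a simplified undo pass, assuming each logged change carries a before-image. Real ARIES-style recovery also writes compensation log records while undoing. This sketch leaves them out and shows only the backward scan, which stops at the oldest active transaction's begin record. The record shape is hypothetical, and the `pages` dictionary is updated in place and returned.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical undo record: (lsn, txn_id, page_id, before_image).
UndoRecord = Tuple[int, str, str, str]


def undo(records: List[UndoRecord], active_txns: Set[str],
         oldest_active_begin_lsn: int, pages: Dict[str, str]) -> Dict[str, str]:
    """Scan backward from the end of the log and roll back active transactions."""
    for lsn, txn, page_id, before in sorted(records, reverse=True):
        if lsn < oldest_active_begin_lsn:
            break  # nothing earlier can belong to an active transaction
        if txn in active_txns:
            pages[page_id] = before  # restore the value from before this change
    return pages
```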
[0068] Thus, an effective, efficient and fast mechanism for recovery from computer system failure has been described. The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention, therefore, is indicated by the appended claims rather than by the preceding description.

Claims (4)
1. Method for recovering from a failure of a computing system (100) in which the contents of a volatile memory (201) are lost, implemented in a computing system (100) that includes the volatile memory (201), non-volatile storage (203) and a non-volatile buffer memory (202), wherein the non-volatile buffer memory (202), together with the volatile memory (201), contributes to an active memory of the computing system (100), and wherein the access speed of the non-volatile buffer memory (202) is faster than that of the non-volatile storage (203), the method characterized by the fact that it comprises the following steps: while the system operates normally: taking checkpoints to the non-volatile storage and to the non-volatile buffer memory, wherein checkpoints are taken more frequently to the non-volatile buffer memory and less frequently to the non-volatile storage; maintaining a log of significant events, comprising at least logging the start of active transactions, logging when checkpoints occur, and logging when a page is written to system memory without being written to the non-volatile storage; and assigning, when an event is recorded in the log, an ID to the event from which the order of events can be derived, the ID being a log sequence number that is incremented for each logged event; writing, to a portion of storage accessible at boot time, information comprising the log sequence number of a last checkpoint and a globally unique identifier of the last page file flushed from active memory to the non-volatile storage, wherein the page file includes all pages included in the non-volatile buffer memory; and changing the globally unique identifier in the non-volatile buffer memory each time a page in the page file is changed; the method further comprising, in the event of a failure of the computing system (100): automatically identifying a plurality of pages that were in volatile memory at the time of the failure, this act comprising: identifying the last log sequence number flushed to a log in the non-volatile storage, then sequentially reviewing the log entries from that last flushed log sequence number and, based on an analysis of the log entries, identifying the pages that were in volatile memory at the time of the failure and also identifying the transactions active at the time of the failure; and, for each of the plurality of pages that were in volatile memory at the time of the failure: automatically determining a recovery source for a recovery version of the page, wherein the recovery source is the non-volatile storage or the non-volatile buffer memory, comprising: comparing the globally unique identifier of the page file in the non-volatile buffer memory with the globally unique identifier of the last page file flushed to the non-volatile storage; if there is no match, scanning the log from the penultimate checkpoint to the non-volatile storage and determining the non-volatile storage as the recovery source; and, if there is a match, determining the non-volatile buffer memory checkpoint to be valid and determining the non-volatile buffer memory as the recovery source; and loading the page from the determined recovery source into volatile memory.
2. Method according to claim 1, characterized in that the automatic determining step determines that most of the plurality of pages have a recovery source corresponding to the non-volatile buffer memory.
3. Method according to claim 1, characterized in that it further comprises: automatically building a mapping that identifies, for each page that has a recovery version within the non-volatile buffer memory, the location of that recovery version within the non-volatile buffer memory.
4. Method according to claim 1, characterized in that the non-volatile buffer memory has more than one memory level.
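The per-page decision in claim 1, together with the mapping of claim 3, can be sketched as follows. The slot-list representation of the page file is an assumption made for illustration, and so is the fallback to storage for pages absent from the buffer. The claims do not specify a layout.

```python
from typing import Dict, List, Optional, Tuple


def build_nvm_map(slots: List[Optional[str]]) -> Dict[str, int]:
    """Claim 3 style mapping: page_id -> slot index of its copy in the NVM buffer."""
    return {page_id: i for i, page_id in enumerate(slots) if page_id is not None}


def recovery_source(page_id: str, nvm_guid: str, flushed_guid: str,
                    nvm_map: Dict[str, int]) -> Tuple[str, Optional[int]]:
    """Pick where to load a page from, following the GUID test of claim 1."""
    if nvm_guid == flushed_guid and page_id in nvm_map:
        return "nvm", nvm_map[page_id]   # valid buffer copy at this slot
    return "storage", None               # GUID mismatch or page not buffered
```

When most dirty pages are found in the map, most loads come from the buffer memory, which is the situation claim 2 describes.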

Similar technologies:

Publication number | Publication date | Title

BR112016007142B1|2022-01-25|Method for recovering from a computer system failure in which the contents of volatile memory are lost

US20210026837A1|2021-01-28|Persistent memory management

US20190251067A1|2019-08-15|Snapshots for a non-volatile device

US10289545B2|2019-05-14|Hybrid checkpointed memory

JP6556911B2|2019-08-07|Method and apparatus for performing an annotated atomic write operation

CN105843551B|2020-09-15|Data integrity and loss resistance in high performance and large capacity storage deduplication

US9342256B2|2016-05-17|Epoch based storage management for a storage device

Bailey et al.|2013|Exploring storage class memory with key value stores

US11048676B2|2021-06-29|Trees and graphs in flash memory

Zheng et al.|2016|HMVFS: A hybrid memory versioning file system

US11030092B2|2021-06-08|Access request processing method and apparatus, and computer system

Son et al.|2017|SSD-assisted backup and recovery for database systems

Zhang et al.|2017|A cost-efficient nvm-based journaling scheme for file systems

US10146616B1|2018-12-04|Cache based recovery of corrupted or missing data

Wei et al.|2016|Extending SSD lifetime with persistent in-memory metadata management

Chen et al.|2017|UDORN: A design framework of persistent in-memory key-value database for NVM

Zhang et al.|2015|Write-combined logging: An optimized logging for consistency in NVRAM

Lee et al.|2014|Last block logging mechanism for improving performance and lifetime on SCM-based file system

US20200272424A1|2020-08-27|Methods and apparatuses for cacheline conscious extendible hashing

Harrison|2015|The End of Disk? SSD and In-Memory Databases

Banikazemi et al.|2012|Eucalyptus: Support for effective use of persistent memory

Patent family:

Publication number | Publication date

US20170132071A1|2017-05-11|

WO2015065863A2|2015-05-07|

US20150121126A1|2015-04-30|

US10437662B2|2019-10-08|

WO2015065863A3|2015-07-09|

EP3063631B1|2017-08-09|

US9558080B2|2017-01-31|

CN105706061B|2019-01-08|

CN105706061A|2016-06-22|

BR112016007142A2|2017-08-01|

EP3063631A2|2016-09-07|

BR112016007142A8|2020-03-03|

Cited references:

Publication number | Application date | Publication date | Applicant | Title

DE69126066T2|1990-06-29|1997-09-25|Oracle Corp|Method and device for optimizing logbook usage|

US5369757A|1991-06-18|1994-11-29|Digital Equipment Corporation|Recovery logging in the presence of snapshot files by ordering of buffer pool flushing|

US5918225A|1993-04-16|1999-06-29|Sybase, Inc.|SQL-based database system with improved indexing methodology|

US7188272B2|2003-09-29|2007-03-06|International Business Machines Corporation|Method, system and article of manufacture for recovery from a failure in a cascading PPRC system|

US20060059209A1|2004-09-14|2006-03-16|Lashley Scott D|Crash recovery by logging extra data|

US8452929B2|2005-04-21|2013-05-28|Violin Memory Inc.|Method and system for storage of data in non-volatile media|

US7765361B2|2006-11-21|2010-07-27|Microsoft Corporation|Enforced transaction system recoverability on media without write-through|

US8370562B2|2007-02-25|2013-02-05|Sandisk Il Ltd.|Interruptible cache flushing in flash memory systems|

US7904427B2|2008-01-11|2011-03-08|Microsoft Corporation|Lazier timestamping in a transaction time database|

US8639886B2|2009-02-03|2014-01-28|International Business Machines Corporation|Store-to-load forwarding mechanism for processor runahead mode operation|

US8250111B2|2009-02-27|2012-08-21|International Business Machines Corporation|Automatic detection and correction of hot pages in a database system|

US8712984B2|2010-03-04|2014-04-29|Microsoft Corporation|Buffer pool extension for database server|

US10430298B2|2010-10-28|2019-10-01|Microsoft Technology Licensing, Llc|Versatile in-memory database recovery using logical log records|

US9361044B2|2011-03-28|2016-06-07|Western Digital Technologies, Inc.|Power-safe data management system|

US8909996B2|2011-08-12|2014-12-09|Oracle International Corporation|Utilizing multiple storage devices to reduce write latency for database logging|

US9158700B2|2012-01-20|2015-10-13|Seagate Technology Llc|Storing cached data in over-provisioned memory in response to power loss|

US8984247B1|2012-05-10|2015-03-17|Western Digital Technologies, Inc.|Storing and reconstructing mapping table data in a data storage system|

US9442858B2|2012-07-13|2016-09-13|Ianywhere Solutions, Inc.|Solid state drives as a persistent cache for database systems|

US9423978B2|2013-05-08|2016-08-23|Nexgen Storage, Inc.|Journal management|

US20150074456A1|2012-03-02|2015-03-12|Doe Hyun Yoon|Versioned memories using a multi-level cell|

US9558080B2|2013-10-31|2017-01-31|Microsoft Technology Licensing, Llc|Crash recovery using non-volatile memory|

US10402259B2|2015-05-29|2019-09-03|Nxp Usa, Inc.|Systems and methods for resource leakage recovery in processor hardware engines|

WO2016204529A1|2015-06-16|2016-12-22|한양대학교 산학협력단|Memory storage device and method for preventing data loss after power loss|

US10296418B2|2016-01-19|2019-05-21|Microsoft Technology Licensing, Llc|Versioned records management using restart era|

US9952931B2|2016-01-19|2018-04-24|Microsoft Technology Licensing, Llc|Versioned records management using restart era|

US9858151B1|2016-10-03|2018-01-02|International Business Machines Corporation|Replaying processing of a restarted application|

EP3364329A1|2017-02-21|2018-08-22|Mastercard International Incorporated|Security architecture for device applications|

CN108958959A|2017-05-18|2018-12-07|北京京东尚科信息技术有限公司|The method and apparatus for detecting hive tables of data|

KR20180126921A|2017-05-19|2018-11-28|에스케이하이닉스 주식회사|Data storage device and operating method thereof|

KR20190073768A|2017-12-19|2019-06-27|에스케이하이닉스 주식회사|Memory system and operating method thereof and data processing system including memory system|

US10776208B2|2018-07-18|2020-09-15|EMC IP Holding Company LLC|Distributed memory checkpointing using storage class memory systems|

US11188516B2|2018-08-24|2021-11-30|Oracle International Corporation|Providing consistent database recovery after database failure for distributed databases with non-durable storage leveraging background synchronization point|

US11099948B2|2018-09-21|2021-08-24|Microsoft Technology Licensing, Llc|Persistent storage segment caching for data recovery|

US20200125457A1|2018-10-19|2020-04-23|Oracle International Corporation|Using non-volatile memory to improve the availability of an in-memory database|

Legal status:
2020-03-17| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|

2021-11-16| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|

2022-01-25| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 27/10/2014, SUBJECT TO THE LEGAL CONDITIONS. |

Priority:

Application number | Application date | Title

US14/069,028|2013-10-31|

US14/069,028|US9558080B2|2013-10-31|2013-10-31|Crash recovery using non-volatile memory|

PCT/US2014/062313|WO2015065863A2|2013-10-31|2014-10-27|Crash recovery using non-volatile memory|
